A Generalized Parallel Algorithm for Frequent Itemset Mining
Authors
Abstract
A parallel algorithm for finding the frequent itemsets in a set of transactions is presented. The frequent individual items are identified by their index. We assume that the number of processors ($m$) is less than the number of frequent items ($n$). In the first stage, every processor $P_i$, $i \in \{1, \ldots, m-1\}$, sequentially computes the frequent itemsets from the interval $I_i = [(i-1) \cdot p + 1,\ i \cdot p]$, where $p = \lfloor n/m \rfloor$. The processor $P_m$ computes the frequent itemsets from the interval $I_m = [(m-1) \cdot p + 1,\ n]$. In the second stage, the parallel algorithm is applied. The processor $P_i$ computes, step by step, the sets $FI_{i,I_j}$ of the frequent itemsets with individual items from the intervals $I_{i,j} = I_i \cup I_{i+1} \cup \ldots \cup I_j$, $j = i+1, \ldots, m$. To compute the set $FI_{i,I_j}$, the processor $P_i$ uses $FI_{i,I_{j-1}}$, obtained in the previous step, and $FI_{i+1,I_j}$, received from the processor $P_{i+1}$. The main advantage of our parallel algorithm is that its communication pattern is known before the algorithm starts, which allows the communication to be mapped onto the hardware. Another major advantage is that the set of transactions can be distributed to the processors before the algorithm begins. This is possible because a processor $P_i$ has to compute $FI_{i,I_j}$, $j = i+1, \ldots, m$, and therefore needs only the transactions containing the frequent items starting with $I_i$.
Key-Words: Data Mining, Association Rule Discovery, Frequent Itemset Mining, Parallel Algorithms
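The interval partitioning and the fixed communication pattern described in the abstract can be illustrated with a minimal Python sketch. The function names `partition_items` and `communication_schedule` are hypothetical, and the alignment of the second stage into synchronous steps (step s pairs processor i with interval index j = i + s) is an assumption made for illustration. The sketch does not show how the received sets are actually merged, since the abstract does not specify it.

```python
def partition_items(n, m):
    """Split item indices 1..n into m intervals (lo, hi), both inclusive.

    Processors 1..m-1 each get p = floor(n / m) items; processor m
    gets everything that remains, up to n.
    """
    if not 0 < m < n:
        raise ValueError("expected 0 < m < n")
    p = n // m
    intervals = [((i - 1) * p + 1, i * p) for i in range(1, m)]
    intervals.append(((m - 1) * p + 1, n))
    return intervals


def communication_schedule(m):
    """List, per step, the transfers (receiver, sender, j).

    Assumption: at step s, processor i computes the set over
    I_i ∪ ... ∪ I_j with j = i + s, using its own result from the
    previous step and the set received from processor i + 1.
    """
    schedule = []
    for step in range(1, m):
        # Only processors with i + step <= m still have work to do.
        schedule.append([(i, i + 1, i + step) for i in range(1, m - step + 1)])
    return schedule


if __name__ == "__main__":
    print(partition_items(10, 3))
    for step, transfers in enumerate(communication_schedule(3), start=1):
        print(step, transfers)
```

Because the schedule depends only on m, it can be computed before any transaction is read, which is the property the abstract highlights.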
Similar resources
A New Algorithm for High Average-utility Itemset Mining
High utility itemset mining (HUIM) is an emerging field in data mining which has gained growing interest due to its various applications. The goal of this problem is to discover all itemsets whose utility exceeds a minimum threshold. The basic HUIM problem does not consider the length of itemsets in its utility measurement, and utility values tend to become higher for itemsets containing more items...
Accelerating Parallel Frequent Itemset Mining on Graphics Processors with Sorting
Frequent Itemset Mining (FIM) is one of the most investigated fields of data mining. The goal of FIM is to find the most frequently occurring subsets from the transactions within a database. Many methods have been proposed to solve this problem, and the Apriori algorithm is one of the best-known methods for FIM in a transactional database. In ...
New Parallel Algorithms for Frequent Itemset Mining in Very Large Databases
Frequent itemset mining is a classic problem in data mining. It is an unsupervised process concerned with finding frequent patterns (or itemsets) hidden in large volumes of data in order to produce compact summaries or models of the database. These models are typically used to generate association rules, but recently they have also been used in far-reaching domains like e-commerce and bio-i...
An Accelerator for Frequent Itemset Mining from Data Streams with Parallel Item Tree
Frequent itemset mining attempts to find frequent subsets in a transaction database. In this era of big data, demand for frequent itemset mining is increasing. Therefore, the combination of fast implementation and low memory consumption, especially for stream data, is needed. In response to this, we optimize an online algorithm, called Skip LC-SS algorithm [1], for hardware. In this paper, we p...
Weighted Itemset Mining from Bigdata using Hadoop
Data items have been extracted using an empirical data mining technique called frequent itemset mining. In the majority of application contexts, items are enriched with weights. Pushing item weights into the itemset extraction process, i.e., mining weighted itemsets rather than traditional itemsets, is an appealing research direction. Although many efficient weighted itemset mining algorithms a...